Improved similarity scores for comparing motifs

نویسندگان

  • Emi Tanaka
  • Timothy L. Bailey
  • Charles E. Grant
  • William Stafford Noble
  • Uri Keich
چکیده

MOTIVATION A question that often comes up after applying a motif finder to a set of co-regulated DNA sequences is whether the reported putative motif is similar to any known motif. While several tools have been designed for this task, Habib et al. pointed out that the scores that are commonly used for measuring similarity between motifs do not distinguish between a good alignment of two informative columns (say, all-A) and one of two uninformative columns. This observation explains why tools such as Tomtom occasionally return an alignment of uninformative columns which is clearly spurious. To address this problem, Habib et al. suggested a new score [Bayesian Likelihood 2-Component (BLiC)] which uses a Bayesian information criterion to penalize matches that are also similar to the background distribution. RESULTS We show that the BLiC score exhibits other, highly undesirable properties, and we offer instead a general approach to adjust any motif similarity score so as to reduce the number of reported spurious alignments of uninformative columns. We implement our method in Tomtom and show that, without significantly compromising Tomtom's retrieval accuracy or its runtime, we can drastically reduce the number of uninformative alignments. AVAILABILITY AND IMPLEMENTATION The modified Tomtom is available as part of the MEME Suite at http://meme.nbcr.net.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Comparing protein sequence-based and predicted secondary structure-based methods for identification of remote homologs.

We have compared a novel sequence-structure matching technique, FORESST, for detecting remote homologs to three existing sequence based methods, including local amino acid sequence similarity by BLASTP, hidden Markov models (HMMs) of sequences of protein families using SAM, HMMs based on sequence motifs identified using meta-MEME. FORESST compares predicted secondary structures to a library of ...

متن کامل

SArKS: Discovering gene expression regulatory motifs and domains by suffix array kernel smoothing

Experiments designed to assess differential gene expression represent a rich resource for discovering how DNA regulatory sequences influence transcription. Results derived from such experiments are usually quantified as continuous scores, such as fold changes, test statistics and p-values. We present a de novo motif discovery algorithm, SArKS, which uses a nonparametric kernel smoothing approac...

متن کامل

Metrics for comparing regulatory sequences on the basis of pattern counts

MOTIVATION Upstream sequences contain short motifs, which mediate transcriptional regulation by specifically binding different transcription factors. The presence of common motifs in the regulatory regions of two genes might be considered as a clue for a potential co-regulation. A pattern count-based (dis)similarity metric between sequences could thus be used to classify genes according to thei...

متن کامل

A Novel Alignment-Free Method for Comparing Transcription Factor Binding Site Motifs

BACKGROUND Transcription factor binding site (TFBS) motifs can be accurately represented by position frequency matrices (PFM) or other equivalent forms. We often need to compare TFBS motifs using their PFMs in order to search for similar motifs in a motif database, or cluster motifs according to their binding preference. The majority of current methods for motif comparison involve a similarity ...

متن کامل

MACRO-PERFECTOS-APE — MAtrix CompaRisOn & PrEdicting Regulatory Functional Effect of SNPs by Approximate P-value Estimation

Here we present MACRO-APE and PERFECTOS-APE software designed for practical sequence analysis involving classic mononucleotide and dinucleotide position weight matrices (PWMs) of DNA sequence patterns often called motifs. The common usage case for DNA motifs is representation of transcription factor binding sites. The software allows (1) comparing different PWMs using a variant of Jaccard simil...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • Bioinformatics

دوره 27 12  شماره 

صفحات  -

تاریخ انتشار 2011